149 research outputs found

    Using GWAS Data to Identify Copy Number Variants Contributing to Common Complex Diseases

    Copy number variants (CNVs) account for more polymorphic base pairs in the human genome than do single nucleotide polymorphisms (SNPs). CNVs encompass genes as well as noncoding DNA, making these polymorphisms good candidates for functional variation. Consequently, most modern genome-wide association studies test CNVs along with SNPs, after inferring copy number status from the data generated by high-throughput genotyping platforms. Here we give an overview of CNV genomics in humans, highlighting patterns that inform methods for identifying CNVs. We describe how genotyping signals are used to identify CNVs and provide an overview of existing statistical models and methods used to infer location and carrier status from such data, especially the most commonly used methods exploring hybridization intensity. We compare the power of such methods with the alternative method of using tag SNPs to identify CNV carriers. As such methods are only powerful when applied to common CNVs, we describe two alternative approaches that can be informative for identifying rare CNVs contributing to disease risk. We focus particularly on methods identifying de novo CNVs and show that such methods can be more powerful than case-control designs. Finally we present some recommendations for identifying CNVs contributing to common complex disorders. Comment: Published in Statistical Science (http://www.imstat.org/sts/) at http://dx.doi.org/10.1214/09-STS304 by the Institute of Mathematical Statistics (http://www.imstat.org

    Development and production of an oligonucleotide MuscleChip: use for validation of ambiguous ESTs

    BACKGROUND: We describe the development, validation, and use of a highly redundant 120,000 oligonucleotide microarray (MuscleChip) containing 4,601 probe sets representing 1,150 known genes expressed in muscle and 2,075 EST clusters from a non-normalized subtracted muscle EST sequencing project (28,074 EST sequences). This set included 369 novel EST clusters showing no match to previously characterized proteins in any database. Each probe set was designed to contain 20–32 25 mer oligonucleotides (10–16 paired perfect match and mismatch probe pairs per gene), with each probe evaluated for hybridization kinetics (Tm) and similarity to other sequences. The 120,000 oligonucleotides were synthesized by photolithography and light-activated chemistry on each microarray. RESULTS: Hybridization of human muscle cRNAs to this MuscleChip (33 samples) showed a correlation of 0.6 between the number of ESTs sequenced in each cluster and hybridization intensity. Out of 369 novel EST clusters not showing any similarity to previously characterized proteins, we focused on 250 EST clusters that were represented by robust probe sets on the MuscleChip fulfilling all stringent rules. 102 (41%) were found to be consistently "present" by analysis of hybridization to human muscle RNA, of which 40 ESTs (39%) could be genome anchored to potential transcription units in the human genome sequence. 19 ESTs of the 40 ESTs were furthermore computer-predicted as exons by one or more than three gene identification algorithms. CONCLUSION: Our analysis found 40 transcriptionally validated, genome-anchored novel EST clusters to be expressed in human muscle. As most of these ESTs were low copy clusters (duplex and triplex) in the original 28,000 EST project, the identification of these as significantly expressed is a robust validation of the transcript units that permits subsequent focus on the novel proteins encoded by these genes

    The genetic architecture of type 2 diabetes

    The genetic architecture of common traits, including the number, frequency, and effect sizes of inherited variants that contribute to individual risk, has been long debated. Genome-wide association studies have identified scores of common variants associated with type 2 diabetes, but in aggregate, these explain only a fraction of heritability. To test the hypothesis that lower-frequency variants explain much of the remainder, the GoT2D and T2D-GENES consortia performed whole genome sequencing in 2,657 Europeans with and without diabetes, and exome sequencing in a total of 12,940 subjects from five ancestral groups. To increase statistical power, we expanded sample size via genotyping and imputation in a further 111,548 subjects. Variants associated with type 2 diabetes after sequencing were overwhelmingly common and most fell within regions previously identified by genome-wide association studies. Comprehensive enumeration of sequence variation is necessary to identify functional alleles that provide important clues to disease pathophysiology, but large-scale sequencing does not support a major role for lower-frequency variants in predisposition to type 2 diabetes

    Genetic fine mapping and genomic annotation defines causal mechanisms at type 2 diabetes susceptibility loci

    We performed fine-mapping of 39 established type 2 diabetes (T2D) loci in 27,206 cases and 57,574 controls of European ancestry. We identified 49 distinct association signals at these loci, including five mapping in/near KCNQ1. “Credible sets” of variants most likely to drive each distinct signal mapped predominantly to non-coding sequence, implying that T2D association is mediated through gene regulation. Credible set variants were enriched for overlap with FOXA2 chromatin immunoprecipitation binding sites in human islet and liver cells, including at MTNR1B, where fine-mapping implicated rs10830963 as driving T2D association. We confirmed that this T2D-risk allele increases FOXA2-bound enhancer activity in islet- and liver-derived cells. We observed allele-specific differences in NEUROD1 binding in islet-derived cells, consistent with evidence that the T2D-risk allele increases islet MTNR1B expression. Our study demonstrates how integration of genetic and genomic information can define molecular mechanisms through which variants underlying association signals exert their effects on disease

    Therapeutic targets for HIV-1 infection in the host proteome

    BACKGROUND: Despite the success of HAART, patients often stop treatment due to the inception of side effects. Furthermore, viral resistance often develops, making one or more of the drugs ineffective. Identification of novel targets for therapy that may not develop resistance is sorely needed. Therefore, to identify cellular proteins that may be up-regulated in HIV infection and play a role in infection, we analyzed the effects of Tat on cellular gene expression during various phases of the cell cycle. RESULTS: SOM and k-means clustering analyses revealed a dramatic alteration in transcriptional activity at the G1/S checkpoint. Tat regulates the expression of a variety of gene ontologies, including DNA-binding proteins, receptors, and membrane proteins. Using siRNA to knock down expression of several gene targets, we show that an Oct1/2 binding protein, an HIV Rev binding protein, cyclin A, and PPGB, a cathepsin that binds NA, are important for viral replication following induction from latency and de novo infection of PBMCs. CONCLUSION: Based on exhaustive and stringent data analysis, we have compiled a list of gene products that may serve as potential therapeutic targets for the inhibition of HIV-1 replication. Several genes have been established as important for HIV-1 infection and replication, including Pou2AF1 (OBF-1), complement factor H related 3, CD4 receptor, ICAM-1, NA, and cyclin A1. There were also several genes whose role in HIV-1 infection has not been established; these may be novel and efficacious therapeutic targets and thus necessitate further study. Importantly, targeting certain cellular protein kinases, receptors, membrane proteins, and/or cytokines/chemokines may result in adverse effects. Where two or more proteins have similar functions and only one is critical for HIV-1 transcription, targeting that one protein may decrease the chance of developing treatments with negative side effects

    Sequence data and association statistics from 12,940 type 2 diabetes cases and controls

    To investigate the genetic basis of type 2 diabetes (T2D) to high resolution, the GoT2D and T2D-GENES consortia catalogued variation from whole-genome sequencing of 2,657 European individuals and exome sequencing of 12,940 individuals of multiple ancestries. Over 27M SNPs, indels, and structural variants were identified, including 99% of low-frequency (minor allele frequency [MAF] 0.1–5%) non-coding variants in the whole-genome sequenced individuals and 99.7% of low-frequency coding variants in the whole-exome sequenced individuals. Each variant was tested for association with T2D in the sequenced individuals, and, to increase power, most were tested in larger numbers of individuals (>80% of low-frequency coding variants in ~82 K Europeans via the exome chip, and ~90% of low-frequency non-coding variants in ~44 K Europeans via genotype imputation). The variants, genotypes, and association statistics from these analyses provide the largest reference to date of human genetic information relevant to T2D, for use in activities such as T2D-focused genotype imputation, functional characterization of variants or genes, and other novel analyses to detect associations between sequence variation and T2D

    Independent test assessment using the extreme value distribution theory

    The new generation of whole genome sequencing platforms offers great possibilities and challenges for dissecting the genetic basis of complex traits. With a very high number of sequence variants, a naïve multiple hypothesis threshold correction hinders the identification of reliable associations by the overreduction of statistical power. In this report, we examine 2 alternative approaches to improve the statistical power of a whole genome association study to detect reliable genetic associations. The approaches were tested using the Genetic Analysis Workshop 19 (GAW19) whole genome sequencing data. The first tested method estimates the real number of effective independent tests actually being performed in a whole genome association project by the use of an extreme value distribution and a set of phenotype simulations. Given the familial nature of the GAW19 data and the finite number of pedigree founders in the sample, the number of correlations between genotypes is greater than in a set of unrelated samples. Using our procedure, we estimate that the effective number represents only 15% of the total number of independent tests performed. However, even using this corrected significance threshold, no genome-wide significant association could be detected for systolic and diastolic blood pressure traits. The second approach implements a biological relevance-driven hypothesis tested by exploiting prior computational predictions on the effect of nonsynonymous genetic variants detected in a whole genome sequencing association study. This guided testing approach was able to identify 2 promising single-nucleotide polymorphisms (SNPs), 1 for each trait, targeting biologically relevant genes that could help shed light on the genesis of human hypertension. The first gene, PFH14, associated with systolic blood pressure, interacts directly with genes involved in calcium-channel formation and the second gene, MAP4, encodes a microtubule-associated protein and had already been detected by previous genome-wide association study experiments conducted in an Asian population. Our results highlight the necessity of developing alternative approaches to improve the efficiency of detecting reasonable candidate associations in whole genome sequencing studies

    Population Bottlenecks as a Potential Major Shaping Force of Human Genome Architecture

    The modern synthetic view of human evolution proposes that the fixation of novel mutations is driven by the balance among selective advantage, selective disadvantage, and genetic drift. When considering the global architecture of the human genome, the same model can be applied to understanding the rapid acquisition and proliferation of exogenous DNA. To explore the evolutionary forces that might have morphed human genome architecture, we investigated the origin, composition, and functional potential of numts (nuclear mitochondrial pseudogenes), partial copies of the mitochondrial genome found abundantly in chromosomal DNA. Our data indicate that these elements are unlikely to be advantageous, since they possess no gross positional, transcriptional, or translational features that might indicate beneficial functionality subsequent to integration. Using sequence analysis and fossil dating, we also show a probable burst of integration of numts in the primate lineage that centers on the prosimian–anthropoid split, mimics closely the temporal distribution of Alu and processed pseudogene acquisition, and coincides with the major climatic change at the Paleocene–Eocene boundary. We therefore propose a model according to which the gross architecture and repeat distribution of the human genome can be largely accounted for by a population bottleneck early in the anthropoid lineage and subsequent effectively neutral fixation of repetitive DNA, rather than positive selection or unusual insertion pressures

    Independent test assessment using the extreme value distribution theory

    The new generation of whole genome sequencing platforms offers great possibilities and challenges for dissecting the genetic basis of complex traits. With a very high number of sequence variants, a naïve multiple hypothesis threshold correction hinders the identification of reliable associations by the overreduction of statistical power. In this report, we examine 2 alternative approaches to improve the statistical power of a whole genome association study to detect reliable genetic associations. The approaches were tested using the Genetic Analysis Workshop 19 (GAW19) whole genome sequencing data. The first tested method estimates the real number of effective independent tests actually being performed in a whole genome association project by the use of an extreme value distribution and a set of phenotype simulations. Given the familial nature of the GAW19 data and the finite number of pedigree founders in the sample, the number of correlations between genotypes is greater than in a set of unrelated samples. Using our procedure, we estimate that the effective number represents only 15% of the total number of independent tests performed. However, even using this corrected significance threshold, no genome-wide significant association could be detected for systolic and diastolic blood pressure traits. The second approach implements a biological relevance-driven hypothesis tested by exploiting prior computational predictions on the effect of nonsynonymous genetic variants detected in a whole genome sequencing association study. This guided testing approach was able to identify 2 promising single-nucleotide polymorphisms (SNPs), 1 for each trait, targeting biologically relevant genes that could help shed light on the genesis of human hypertension. The first gene, PFH14, associated with systolic blood pressure, interacts directly with genes involved in calcium-channel formation and the second gene, MAP4, encodes a microtubule-associated protein and had already been detected by previous genome-wide association study experiments conducted in an Asian population. Our results highlight the necessity of developing alternative approaches to improve the efficiency of detecting reasonable candidate associations in whole genome sequencing studies. http://deepblue.lib.umich.edu/bitstream/2027.42/134747/1/12919_2016_Article_38.pd